Rule Induction for Click-Stream Analysis: Set Covering and Compositional Approach
Abstract
We present a set covering algorithm and a compositional algorithm for describing sequences of visits to WWW pages in click-stream data. The set covering algorithm uses rule specialization, as in the well-known CN2 algorithm; the compositional algorithm is based on our original KEX algorithm. Unlike CN2 and KEX, however, both algorithms deal with sequences of events (visited pages) instead of sets of attribute-value pairs. The learned rules can be used to predict the next page a user will view, or to describe the most typical paths of WWW page visitors and the dependencies among the pages. We have successfully applied both algorithms to real data from an internet shop and mined useful information from them.
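The set covering idea can be illustrated with a small sketch. This is not the authors' algorithm; it is a hypothetical reconstruction under stated assumptions: a rule has the form "if the most recently visited pages were X, Y, then the next page is Z"; rule quality is the Laplace accuracy used in CN2; specialization is greedy (beam width 1) and prepends one earlier page at a time; and covering builds an ordered rule list, removing every example a new rule covers. All function names (`make_examples`, `learn_rule`, `sequential_covering`, `predict`) and the toy sessions are illustrative only.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation: an example is (pages visited so far, next page);
# a rule is (required most-recent pages, predicted next page).
Example = Tuple[Tuple[str, ...], str]
Rule = Tuple[Tuple[str, ...], str]


def make_examples(sessions: List[List[str]]) -> List[Example]:
    """Turn every session into (history, next page) training pairs."""
    return [(tuple(s[:i]), s[i]) for s in sessions for i in range(1, len(s))]


def covers(cond: Tuple[str, ...], history: Tuple[str, ...]) -> bool:
    """A condition matches when the history ends with exactly that page sequence."""
    return len(history) >= len(cond) and history[len(history) - len(cond):] == cond


def evaluate(cond: Tuple[str, ...], examples: List[Example]) -> Tuple[Optional[str], float, int]:
    """Return the majority next page, its Laplace accuracy and the number of covered examples."""
    covered = [nxt for hist, nxt in examples if covers(cond, hist)]
    if not covered:
        return None, 0.0, 0
    page, hits = Counter(covered).most_common(1)[0]
    n_classes = len({nxt for _, nxt in examples})
    return page, (hits + 1) / (len(covered) + n_classes), len(covered)


def learn_rule(examples: List[Example], min_cover: int = 2) -> Optional[Rule]:
    """Greedy (beam width 1) specialization: start from a single page and keep
    prepending an earlier page while the Laplace accuracy strictly improves."""
    pages = sorted({p for hist, _ in examples for p in hist})
    best = None  # (quality, condition, predicted page)
    frontier = [(p,) for p in pages]
    while frontier:
        scored = []
        for cond in frontier:
            page, quality, n = evaluate(cond, examples)
            if page is not None and n >= min_cover:
                scored.append((quality, cond, page))
        if not scored:
            break
        top = max(scored)  # ties broken deterministically by the condition tuple
        if best is not None and top[0] <= best[0]:
            break
        best = top
        # Specialize: additionally require one more page earlier in the path.
        frontier = [(p,) + best[1] for p in pages]
    return None if best is None else (best[1], best[2])


def sequential_covering(examples: List[Example], min_cover: int = 2) -> List[Rule]:
    """Learn an ordered rule list, removing the examples each new rule covers."""
    rules: List[Rule] = []
    remaining = list(examples)
    while remaining:
        rule = learn_rule(remaining, min_cover)
        if rule is None:
            break
        rules.append(rule)
        remaining = [(h, n) for h, n in remaining if not covers(rule[0], h)]
    return rules


def predict(rules: List[Rule], history: List[str]) -> Optional[str]:
    """Return the prediction of the first matching rule, or None."""
    for cond, page in rules:
        if covers(cond, tuple(history)):
            return page
    return None


if __name__ == "__main__":
    sessions = [
        ["home", "search", "product", "cart"],
        ["home", "search", "product", "cart"],
        ["home", "catalog", "product", "home"],
        ["home", "catalog", "product", "home"],
    ]
    rules = sequential_covering(make_examples(sessions))
    for cond, page in rules:
        print(f"IF last pages = {list(cond)} THEN next = {page}")
    print(predict(rules, ["home", "search", "product"]))   # cart
    print(predict(rules, ["home", "catalog", "product"]))  # home
```

On these toy sessions the rule `['search', 'product'] => cart` appears only after specialization: the page seen before *product* decides whether the visitor goes on to the cart. Because the rules form an ordered list, that more specific rule is tried before the general `['product'] => home`; an unordered rule set would instead need a conflict-resolution step.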
Publication date: 2005